The notion of two-way automata was introduced at the very beginning of automata theory. In 1959,
Rabin and Scott [31] and, independently, Shepherdson [35] proved that these models, both in the
deterministic and in the nondeterministic versions, have the same power as one-way automata, namely,
they characterize the class of regular languages. In 1978, Sakoda and Sipser [32] posed the question
of the cost, in the number of states, of the simulation of one-way and two-way nondeterministic
automata by two-way deterministic automata. They conjectured that these costs are exponential. In
spite of all attempts to solve it, this question is still open. In the last ten years the problem of Sakoda
and Sipser has been widely reconsidered and many new results related to it have been obtained. In this
work we discuss some of them. In particular, we focus on the restriction to the unary case and on the
connections with open questions in space complexity.



1 Introduction and Preliminaries 



Finite state automata are usually presented as devices which are able to recognize input strings using a
fixed amount of memory, implemented using a finite state control (see, e.g., [13]). The input string is
written on a read-only tape, which is scanned by an input head. In the basic model the input head is
moved only from left to right. For this reason the model is also called one-way finite automaton. It can
be defined in the deterministic and the nondeterministic versions (1DFA and 1NFA, respectively). It is
well known that both of them share the same recognition power, i.e., they characterize the class of regular
languages. However, nondeterministic finite automata can be exponentially smaller. In fact, each n-state
1NFA can be simulated by an equivalent 1DFA with 2^n states and this cost cannot be reduced [25, 28, 30].
What happens if we allow the input head to move in both directions?

In spite of this additional feature, the resulting models, which are called two-way finite automata,
have the same computational power as one-way automata, i.e., they still characterize the class of regular
languages, as independently proved by Rabin and Scott [31] and by Shepherdson [35] at the beginning
of automata theory. However, from the point of view of the size (measured in terms of states) the situation
is different. We still do not have a complete picture of the relationships between the sizes of different
variants of finite automata.

By an analysis of the constructions given in [31, 35], it turns out that the simulations of n-state
two-way nondeterministic finite automata (2NFAs, for short) and n-state two-way deterministic finite
automata (2DFAs, for short) by 1DFAs can be done with a number of states exponential in a polynomial
in n. Furthermore, a lower bound exponential in n follows from the simulation of 1NFAs by 1DFAs. The
exact bound for the simulation of 2NFAs by 1NFAs has been found in [17].

The costs of the simulations of 1NFAs by 2DFAs and of 2NFAs by 2DFAs are still unknown. The
problem of determining them was raised in 1978 by Sakoda and Sipser [32], with the conjecture that they are
not polynomial. In spite of all attempts to solve it, this problem is still open.
In the last decade several new results related to the Sakoda and Sipser question have been discovered.
In this paper we discuss some of them (mainly with respect to the question of 2NFAs versus 2DFAs) 
besides some older results in this area. 

Technical Issues 

We will keep the presentation at an informal level, trying to avoid, as much as possible, technical details. 
For this reason we do not give a formal definition of the main model we are interested in, but we just 
present an informal description. 

We assume that the reader is familiar with standard notions concerning finite state automata, as
presented for instance in [12]. We denote by Σ the input alphabet, by Σ* the set of all strings over Σ,
and by Σ^n the set of strings of length n, where n ≥ 0 is an integer. The length of a string w ∈ Σ* will be
denoted by |w|.

A computation of a one-way automaton starts on the leftmost input symbol in the initial state; at 
each step the input head is moved one position to the right; the computation ends immediately after the 
execution of the move which reads the rightmost input symbol. For two-way automata slightly different 
definitions are given in the literature. We skip technical details and we emphasize the main features. 

• First of all, we assume that the input string is surrounded on the input tape by two special symbols,
⊢ and ⊣, called, respectively, the left and the right endmarker. Hence, if the input is w ∈ Σ*, then
the input tape contains ⊢w⊣.

• To present recognition algorithms, sometimes we need to number input cells. So, we assume
that on input w the cells are numbered from 0 to |w| + 1, where cells 0 and |w| + 1 contain the
endmarkers, and the remaining cells contain "real" input symbols. The input head cannot move beyond
the endmarkers.

• The computation starts in a designated initial state with the head scanning the first "real" input
symbol, i.e., on cell 1. Sometimes it is more convenient to start from cell 0. It should be clear that
this does not significantly change the model.

• To reflect the acceptance condition for one-way automata, we can stipulate that a string is accepted 
by a two-way automaton if and only if there is a computation which reaches the right endmarker 
in a final state. However, this condition can be slightly modified by considering acceptance on the 
left endmarker or just on one endmarker. 

A different possibility is to state that a string is accepted if and only if there is a computation which
reaches a final state, regardless of the input head position.

Further variants are possible. It should be clear that all these variants are equivalent. Adding 
one or two states, we can easily convert a two-way automaton with an acceptance condition into 
another one with a different acceptance condition. For this reason, here we do not fix any particular 
acceptance condition. 

• The transition function can be defined by allowing only moves to the left and to the right or even 
allowing stationary moves, i.e., transitions that keep the head on the same input cell. Even this 
possibility does not significantly change the model and the number of states. 

• We point out that a two-way automaton can enter a loop. In this case the computation is
rejecting.
• When we say that a two-way automaton A has f(n) + ... states, we mean that A has f(n) + c states,
where c is a small constant (in all examples c < 10 is enough). This constant can slightly change
depending on the choice of the initial configuration, of the acceptance condition, and of the possibility
of stationary moves.

• A head reversal is any change of the input head direction, i.e., a two-way automaton makes one
head reversal when, after a sequence of transitions moving the head to the right, it makes a transition
moving the head to the left, or vice versa. Stationary moves are not taken into account when counting
head reversals. For instance, a sequence of two moves to the right, one stationary move, one move
to the right, one stationary move again, and one move to the left contains just one head reversal.
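To make the counting rule concrete, here is a minimal Python sketch. The encoding of moves as +1 (right), −1 (left) and 0 (stationary) and the name count_reversals are our own assumptions, chosen only for illustration.

```python
def count_reversals(moves):
    """Count head reversals in a sequence of moves.

    Each move is +1 (right), -1 (left) or 0 (stationary).
    Stationary moves are ignored, as in the definition above.
    """
    reversals = 0
    last_direction = 0  # 0 means that no left/right move has been seen yet
    for move in moves:
        if move == 0:
            continue  # stationary moves never count
        if last_direction != 0 and move != last_direction:
            reversals += 1
        last_direction = move
    return reversals


# The example from the text: right, right, stay, right, stay, left.
print(count_reversals([+1, +1, 0, +1, 0, -1]))  # 1
```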

2 Two Examples 

Let us start by considering the following family of languages 

$$I_n = (a+b)^*\,a\,(a+b)^{n-1},$$

namely, for each integer n > 0, I_n is the set of strings whose nth symbol from the right is an a. This is
a classical example used to present the optimality of the subset construction (actually, this very simple
example does not exactly achieve optimality, but it is very close to it). In particular, for each n ≥ 1,
we can prove the following:

• The language I_n is accepted by the 1NFA with n + 1 states in Figure 1.




Figure 1: A 1NFA accepting the language I_n = (a + b)^* a (a + b)^{n-1}.



• Each 1DFA accepting I_n requires 2^n states. Intuitively, this can be proved by observing that in order
to accept the language I_n, a 1DFA needs to remember the last n input symbols. It is a standard
exercise to construct a 1DFA matching this lower bound.

• The language I_n is accepted by a 2DFA with n + ... states which reverses its input head just once
during each computation. The automaton first scans the input from left to right, only to reach
the right endmarker. Then it moves n positions to the left, finally checking whether or not the
reached input cell contains the symbol a.
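The two deterministic constructions for I_n can be compared with a small simulation. This is only a sketch under our own assumptions: the 1DFA starts in the state b^n (so its states are exactly the 2^n windows of length n), the endmarkers ⊢ and ⊣ are written as the ASCII characters < and >, and the function names are hypothetical.

```python
from itertools import product


def dfa_accepts_In(w, n):
    """1DFA for I_n: the state is the window of the last n symbols read.

    Assumption: the initial state is b^n, so only the 2^n windows are needed.
    """
    window = "b" * n
    for symbol in w:
        window = window[1:] + symbol  # forget the oldest symbol, append the new one
    return window[0] == "a"           # the nth symbol from the right


def two_way_accepts_In(w, n):
    """2DFA for I_n with a single head reversal on the right endmarker.

    The tape is <w>, where < and > stand for the endmarkers in cells 0 and |w|+1.
    """
    tape = "<" + w + ">"
    pos = 1
    while tape[pos] != ">":           # left-to-right scan, nothing is remembered
        pos += 1
    for _ in range(n):                # count n positions to the left
        if tape[pos] == "<":
            return False              # the input is shorter than n
        pos -= 1
    return tape[pos] == "a"


# Both sketches agree on all strings of length at most 8 (here n = 3).
for length in range(9):
    for letters in product("ab", repeat=length):
        w = "".join(letters)
        assert dfa_accepts_In(w, 3) == two_way_accepts_In(w, 3)
```

The 1DFA must carry the whole window in its state, while the 2DFA only needs a counter up to n; this is where the exponential gap comes from.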

This simple example emphasizes that the possibility of moving the input head in both directions can
dramatically reduce the size of deterministic automata. In particular, in this case one reversal is enough
to reduce an automaton of exponential size in n to an automaton of linear size.

We can also observe that the language I_n is accepted by a 1NFA and a 2DFA having approximately
the same size. So the example could suggest the possibility of replacing the nondeterminism in one-way
automata by two-way motion.
We now present a more elaborate variant of this example, which will also be useful to discuss some
restricted versions of two-way automata considered in the literature. For each n > 0, let us consider the
language

$$L_n = (a+b)^*\,a\,(a+b)^{n-1}\,a\,(a+b)^*.$$

In this case we ask that each string in the language contains two a's with n − 1 symbols in between.
The language L_n can be easily accepted by the 1NFA with n + 2 states in Figure 2.




Figure 2: A 1NFA accepting the language L_n = (a + b)^* a (a + b)^{n-1} a (a + b)^*.



What about acceptance of L_n by one-way and two-way deterministic automata?
Let us start by studying acceptance in the one-way case. The idea is very similar to the one outlined
for the language I_n.

We can build an automaton A_n which remembers in its finite control the last n input symbols. Hence,
when in the state corresponding to σ_1σ_2···σ_n a new input symbol γ is read, the automaton moves to the
state corresponding to σ_2···σ_nγ. However, in the case σ_1 = γ = a the automaton moves to its only final
state, where it loops on each input symbol. In Figure 3 the automaton A_3 accepting the language L_3 is
represented. Notice that with this strategy the resulting 1DFA has 2^n + 1 states.
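The construction of A_n can be written down as an explicit transition table. The Python sketch below is ours: it assumes that the initial state is the window b^n (the text does not say how the first n symbols are handled), names the final state "ACC", and checks the result against a regular expression for L_n on all short strings.

```python
import re
from itertools import product


def build_A(n):
    """Build the 1DFA A_n described above as a transition dictionary.

    States are windows (strings of length n over {a, b}) plus the final
    state "ACC". Assumption: the initial state is b^n.
    """
    delta = {}
    for letters in product("ab", repeat=n):
        window = "".join(letters)
        for symbol in "ab":
            if window[0] == "a" and symbol == "a":
                delta[window, symbol] = "ACC"  # two a's at distance exactly n
            else:
                delta[window, symbol] = window[1:] + symbol  # shift the window
    for symbol in "ab":
        delta["ACC", symbol] = "ACC"  # the final state loops on every symbol
    return delta, "b" * n


def run(delta, start, w):
    state = start
    for symbol in w:
        state = delta[state, symbol]
    return state == "ACC"


n = 3
delta, start = build_A(n)
print(len({state for state, _ in delta}))  # 9, that is, 2**3 + 1
pattern = re.compile(f"[ab]*a[ab]{{{n - 1}}}a[ab]*")
for length in range(10):
    for letters in product("ab", repeat=length):
        w = "".join(letters)
        assert run(delta, start, w) == bool(pattern.fullmatch(w))
```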




Figure 3: The 1DFA A_3 accepting the language L_3 = (a + b)^* a (a + b)^2 a (a + b)^*.

We can show that each automaton A_n is minimal. This can be done by using classical distinguishability
arguments (see, e.g., [12]) along the following lines:
• Any two different strings x, y of length n are distinguishable. To prove this it is enough to
consider the string b^{i−1}a, where i, 1 ≤ i ≤ n, is the index of the leftmost position in which x and y differ,
and to verify that exactly one of the strings xb^{i−1}a and yb^{i−1}a belongs to L_n.

• Each string of length n does not belong to L_n and, hence, it is distinguishable from a^{n+1}, which
belongs to L_n.

• Hence, the 2^n + 1 strings in the set Σ^n ∪ {a^{n+1}} are pairwise distinguishable for L_n. As a
consequence, 2^n + 1 is a lower bound for the number of states of each 1DFA accepting L_n. This lower
bound matches the number of states of the automaton A_n described above.
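For small n, the distinguishability argument can be checked mechanically. The following Python sketch (helper names are ours) tests, for every pair of strings of length n, that the suffix b^{i−1}a separates them, and that a^{n+1} is separated from all of them by the empty suffix.

```python
from itertools import product


def in_L(w, n):
    """Membership in L_n: two a's at distance exactly n."""
    return any(w[i] == "a" and w[i + n] == "a" for i in range(len(w) - n))


def witness(x, y):
    """Suffix b^(i-1) a, where i is the leftmost (1-based) position
    in which x and y differ."""
    i = next(k for k in range(len(x)) if x[k] != y[k]) + 1
    return "b" * (i - 1) + "a"


def check_fooling_set(n):
    """Verify the argument above; return the size of Σ^n ∪ {a^(n+1)}."""
    words = ["".join(p) for p in product("ab", repeat=n)]
    for x in words:
        assert not in_L(x, n)  # strings of length n are never in L_n ...
        for y in words:
            if x < y:
                z = witness(x, y)
                assert in_L(x + z, n) != in_L(y + z, n)  # exactly one is in L_n
    assert in_L("a" * (n + 1), n)  # ... while a^(n+1) is
    return len(words) + 1


print(check_fooling_set(4))  # 17 pairwise distinguishable strings
```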

Now, we discuss a different strategy to accept L_n using a two-way automaton. In the following let
w = w_1w_2···w_m, with w_i ∈ {a, b}, i = 1, ..., m, m ≥ 0, be the input string whose membership in L_n
we want to check.

(i) Naive algorithm

To decide whether a string w ∈ Σ* belongs to L_n, for i = 1, ..., |w| − n we check whether both symbols in
positions i and i + n are a's. The input is accepted if the condition is satisfied for at least one i. This
algorithm can be implemented by a 2DFA that, to move from position i to position i + n, counts n
positions forward, and then counts n − 1 positions backward to reach position i + 1. Furthermore,
when moving from position i to position i + n, the automaton needs to remember whether or not
the symbol in position i is an a. This leads to a 2DFA with O(n) states which moves the input head
along a zig-zag trajectory.
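The zig-zag movement can be traced with a small simulation that records every head position. This Python sketch is ours (the function name and the ASCII endmarkers < and > are assumptions); in this version the automaton keeps scanning even after finding a matching pair, so the trajectory depends only on the input length.

```python
def naive(w, n):
    """Simulate the zig-zag 2DFA (i) for L_n; return (accepted, trajectory).

    The tape is <w>; cells 0 and len(w) + 1 hold the endmarkers < and >.
    The head starts on cell 1, i.e., at position i = 1.
    """
    tape = "<" + w + ">"
    pos, trajectory, accepted = 1, [1], False
    while tape[pos] != ">":
        first_is_a = tape[pos] == "a"     # one bit kept in the finite control
        for _ in range(n):                # count n positions forward
            pos += 1
            trajectory.append(pos)
            if tape[pos] == ">":
                return accepted, trajectory   # position i + n does not exist
        if first_is_a and tape[pos] == "a":
            accepted = True               # remember success, but keep moving
        for _ in range(n - 1):            # count n - 1 positions backward
            pos -= 1
            trajectory.append(pos)
    return accepted, trajectory


# Same length, different contents: the head trajectory is identical.
print(naive("abab", 2)[1] == naive("bbbb", 2)[1])  # True
```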

(ii) An improved algorithm 

It is immediate to observe that the naive algorithm can be improved. First, when the symbol w_i is
b, we do not need to inspect the symbol w_{i+n}. Second, when a position i is found such that both
symbols w_i and w_{i+n} are a's, the automaton can accept without checking the remaining positions.
This leads to an algorithm which uses no more than 2n + ... states.
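A simulation of the improved strategy also makes it easy to count head reversals. The Python sketch below is ours (function names and ASCII endmarkers are assumptions). Its usage example runs it on inputs of the form (a^n b^n)^k with n = 2, where every a forces a detour and the number of reversals grows linearly with the input length.

```python
def improved(w, n):
    """Simulate the improved 2DFA (ii) for L_n; return (accepted, moves).

    moves records +1 for each step to the right and -1 for each step to the left.
    """
    tape = "<" + w + ">"
    pos, moves = 1, []
    while tape[pos] != ">":
        if tape[pos] == "b":              # w_i = b: skip the check of w_{i+n}
            pos += 1
            moves.append(+1)
            continue
        for _ in range(n):                # w_i = a: count n positions forward
            pos += 1
            moves.append(+1)
            if tape[pos] == ">":
                return False, moves       # no later position can succeed either
        if tape[pos] == "a":
            return True, moves            # accept without further checks
        for _ in range(n - 1):            # back to position i + 1
            pos -= 1
            moves.append(-1)
    return False, moves


def reversals(moves):
    """Number of direction changes in a list of +1/-1 moves."""
    return sum(1 for x, y in zip(moves, moves[1:]) if x != y)


# On (aabb)^k, i.e., a^n b^n repeated with n = 2, reversals grow with k.
for k in (1, 5, 10):
    accepted, moves = improved("aabb" * k, 2)
    print(accepted, reversals(moves))  # prints: False 4 / False 20 / False 40
```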

(iii) A different strategy: head reversals only at the endmarkers 

We can describe a different algorithm to recognize L_n, which is implemented by a 2DFA performing
head reversals only when the input head is visiting the endmarkers. Hence, in this algorithm a
computation is a sequence of left-to-right and right-to-left traversals of the input string, which are
also called sweeps.

We give an informal description of the algorithm: 

• The automaton performs at most n sweeps from left to right, interleaved with sweeps from 
right to left. 

• In the ith sweep from left to right, 1 ≤ i ≤ n, the automaton, starting from cell i, inspects
the contents of cells i, i + n, i + 2n, i + 3n, ..., in order to check whether two of them which
are consecutive in this list (i.e., cells i + jn and i + (j + 1)n, for some j ≥ 0) contain the
symbol a. If this happens then the automaton stops and accepts.

To locate the cells that must be inspected, a counter modulo n is kept in the finite control.
This counter can be implemented using n states. In addition, the automaton needs to remember
the content of the last inspected cell, which doubles the number of states.

• When, in the ith sweep from left to right, the right endmarker is reached, there are two
possibilities. If i < n then the automaton makes a sweep from right to left, in order to prepare the
(i + 1)th sweep from left to right. If i = n then the automaton stops and rejects.
This strategy can be implemented with O(n^2) states, by keeping track of the counter i in the finite
control, and by using 2n states for each sweep from left to right, and just one state for each sweep
from right to left.
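The sweeping strategy can be simulated directly. This Python sketch is ours: like the O(n^2)-state version just described, it keeps the sweep index i explicitly, it uses ASCII endmarkers < and >, and it counts only the reversals made on the endmarkers, checking the answer against a regular expression for L_n.

```python
import re
from itertools import product


def sweeping(w, n):
    """Simulate the sweeping 2DFA (iii) for L_n; return (accepted, reversals).

    The sweep index i is kept explicitly (the O(n^2)-state version);
    head reversals happen only on the endmarkers < and >.
    """
    m = len(w)
    tape = "<" + w + ">"
    reversals = 0
    for i in range(1, n + 1):           # ith sweep from left to right
        counter = (-i) % n              # value on cell 0; it is 0 on cells i, i+n, ...
        last_was_a = False              # content of the last inspected cell
        for pos in range(1, m + 2):
            counter = (counter + 1) % n
            if tape[pos] == ">":
                break                   # right endmarker reached
            if counter == 0:
                if last_was_a and tape[pos] == "a":
                    return True, reversals
                last_was_a = tape[pos] == "a"
        if i == n:
            return False, reversals     # the nth sweep failed: reject
        reversals += 2                  # turn at >, sweep back, turn at <
    return False, reversals


n = 3
pattern = re.compile(f"[ab]*a[ab]{{{n - 1}}}a[ab]*")
for length in range(10):
    for letters in product("ab", repeat=length):
        w = "".join(letters)
        accepted, rev = sweeping(w, n)
        assert accepted == bool(pattern.fullmatch(w))
        assert rev <= 2 * n - 1         # bounded independently of |w|
```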

We can reduce the number of states to O(n) by avoiding storing the sweep counter i. To this aim,
besides the counter c→ used during sweeps from left to right, we also count the input length modulo n
during sweeps from right to left, by introducing another counter c←. After the ith sweep from left to
right, the sweep from right to left starts by assigning to the counter c← a value which depends on the
current value of c→. In this way, at the end of the traversal from right to left, when the left endmarker is
reached again, it is possible to reconstruct the value of i from the value of c←, in order to prepare the next sweep.
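The text does not fix which value is assigned to the right-to-left counter, so the following Python sketch shows just one possible choice, labeled as an assumption. Writing c_right and c_left for the counters used in the sweeps from left to right and from right to left, we assume that c_right equals (p − i) mod n on cell p during sweep i, that c_left is a copy of c_right on the right endmarker, and that c_left is decremented at every move to the left. Then c_left = (−i) mod n on the left endmarker, whatever the input length, so i is recovered without ever being stored.

```python
def sweep_back(i, m, n):
    """Value of c_left on the left endmarker after sweep i (hypothetical scheme).

    During sweep i, c_right equals (p - i) mod n on cell p. On the right
    endmarker (cell m + 1) it is copied into c_left, which is then
    decremented at every move to the left.
    """
    c_right = (m + 1 - i) % n      # value of c_right on the right endmarker
    c_left = c_right               # copied when the sweep back starts
    for _ in range(m + 1):         # m + 1 moves bring the head to cell 0
        c_left = (c_left - 1) % n
    return c_left


def recover_index(c_left, n):
    """Recover i in {1, ..., n} from c_left = (-i) mod n."""
    return n if c_left == 0 else n - c_left


def next_sweep_start(c_left, n):
    """Initial value of c_right for sweep i + 1, i.e., (-(i + 1)) mod n."""
    return (c_left - 1) % n


# For every input length m, the value on the left endmarker determines i.
n, m = 4, 11
for i in range(1, n + 1):
    assert recover_index(sweep_back(i, m, n), n) == i
```

Only values modulo n are ever stored, which is why O(n) states suffice for the counters.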

3 Restricted Models 

We now briefly present and discuss some restricted variants of two-way automata that have been consid- 
ered in the literature. 

Oblivious Automata 

In the naive algorithm (i) that we described to recognize the language L_n, we can observe that for all inputs of
the same length m the "trajectory" of the head during the computation is the same, i.e., the position of
the input head at time t does not depend on the input content, but only on its length. A 2DFA with
this property is called oblivious.

Sweeping Automata 

A two-way automaton performing head reversals only when the input head is visiting the endmarkers is
called a sweeping automaton. This notion has been studied by Sipser [36]. In particular, for the language
L_n described above, the recognition strategy (iii) is based on a sweeping 2DFA.

Rotating Automata 

The method (iii) suggests another model, called rotating automata [21], which we now briefly mention.
A computation of a rotating automaton is a sequence of left-to-right scans of the input. In particular, when
the right end of the input is reached, the computation continues on the leftmost input symbol. In other
words, we can imagine the input tape as circular, with a special cell containing a marker and connecting
the end with the beginning of the tape. With a trivial transformation which doubles the number of
states, each rotating automaton can be transformed into an equivalent sweeping automaton.

The reader can verify that the languages I_n and L_n can be accepted by rotating automata with O(n) states.

Outer Nondeterministic Automata 

All the above mentioned models are defined by restricting the movement of the input head. A different
kind of restriction has been recently considered in [20], by introducing outer nondeterministic
automata (2OFAs). In these models nondeterministic choices can be taken only when the input head is
scanning the endmarkers. Hence, the transitions on "real" input symbols are deterministic. This model
does not have any restriction on head reversals, i.e., 2OFAs can change the direction of the input head at
each position.
The deterministic algorithm (iii) for accepting L_n can be easily transformed into an algorithm for a
(degenerate) outer nondeterministic automaton. At the first step the automaton guesses an integer i, with
1 ≤ i ≤ n, and then it simulates the ith sweep from left to right described in algorithm (iii), rejecting if the
right endmarker is reached without finding two cells i + jn and i + (j + 1)n both containing the symbol
a. This can be implemented just by choosing, in a nondeterministic way, the initial value of the counter
modulo n used during sweeps from left to right, at the beginning of the computation with the head on the left endmarker.
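Since a nondeterministic device accepts if and only if some computation accepts, the 2OFA above can be simulated deterministically by trying every guess. The following Python sketch is ours (the function name is hypothetical); the only nondeterministic step is the choice of i, i.e., of the counter's initial value, and each sweep is fully deterministic.

```python
import re
from itertools import product


def outer_nondeterministic_accepts(w, n):
    """Sketch of the 2OFA for L_n obtained from algorithm (iii).

    The only nondeterministic step is the guess of i (equivalently, of the
    counter's initial value) on the left endmarker; the sweep is deterministic.
    """
    def sweep(i):
        counter = (-i) % n              # guessed initial value of the counter
        last_was_a = False
        for symbol in w:                # one deterministic left-to-right sweep
            counter = (counter + 1) % n
            if counter == 0:            # cells i, i + n, i + 2n, ...
                if last_was_a and symbol == "a":
                    return True
                last_was_a = symbol == "a"
        return False                    # right endmarker reached: reject

    return any(sweep(i) for i in range(1, n + 1))


n = 3
pattern = re.compile(f"[ab]*a[ab]{{{n - 1}}}a[ab]*")
for length in range(10):
    for letters in product("ab", repeat=length):
        w = "".join(letters)
        assert outer_nondeterministic_accepts(w, n) == bool(pattern.fullmatch(w))
```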

Few Reversal Automata 

All the models discussed above are defined by introducing structural restrictions. In the next model the
restriction is of a different kind. On each computation we count the number of reversals of the input head.
A 2DFA is said to be few reversal if the number of head reversals is sublinear
with respect to the input length, i.e., it is o(m), where m is the length of the input. It has been recently
proved that a 2DFA with o(m) reversals is actually a 2DFA with O(1) reversals, i.e., each few reversal
2DFA can make only a number of reversals which is ultimately bounded by a constant [15].

Notice that the algorithm (i) described above clearly uses a number of reversals which is linear in the
length of the input. Even the algorithm (ii) uses a linear number of reversals (consider, e.g., inputs of the
form a^n b^n a^n b^n ··· a^n b^n). On the other hand, in the algorithm (iii) the number of reversals is bounded by
2n − 1, which is a constant with respect to the input length.

In the nondeterministic case we can have several computations on the same input string. For this
reason we can measure head reversals in different ways. For example, we can consider reversals in all
computations, or only in all accepting computations, or just in one accepting computation. This can lead
to different notions of few reversal 2NFAs (something similar is well known in space complexity, where
different space notions have been considered, see, e.g., [26]).

Unambiguous Automata 

This is a well known classical notion: a nondeterministic automaton is unambiguous if and only if
for each input string there is at most one accepting computation. While the 1NFA described above
to recognize I_n is unambiguous, it can be easily seen that the 1NFA in Figure 2 accepting L_n can have many
accepting computations on the same input string, i.e., it is ambiguous.
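Ambiguity can be measured by counting accepting computations with a simple dynamic program over the states. Since the figures are not reproduced here, the Python sketch below assumes that the 1NFAs of Figures 1 and 2 have the usual chain shape: an initial state looping on a and b, a guessed a leading into a chain of states, and, for L_n only, a final state looping on a and b. The function name and the state numbering are ours.

```python
from collections import defaultdict
from itertools import product


def count_accepting(w, n, second_a):
    """Count the accepting computations of a 1NFA on w by dynamic programming.

    second_a=False: the (n+1)-state 1NFA for I_n (states 0..n, final state n).
    second_a=True: the (n+2)-state 1NFA for L_n (states 0..n+1, final n+1).
    State 0 loops on a and b; on an a it may also guess to enter state 1.
    """
    final = n + 1 if second_a else n
    paths = {0: 1}                          # computations ending in each state
    for symbol in w:
        nxt = defaultdict(int)
        for q, k in paths.items():
            if q == 0:
                nxt[0] += k                 # keep looping in the initial state
                if symbol == "a":
                    nxt[1] += k             # guess that this a is the special one
            elif q < n:
                nxt[q + 1] += k             # deterministic chain on a and b
            elif q == n and second_a and symbol == "a":
                nxt[n + 1] += k             # second a, exactly n positions later
            elif q == n + 1:
                nxt[n + 1] += k             # final state of L_n loops on a and b
        paths = nxt
    return paths.get(final, 0)


# The 1NFA for I_n is unambiguous: never more than one accepting computation.
assert all(count_accepting("".join(p), 3, False) <= 1
           for length in range(9) for p in product("ab", repeat=length))
print(count_accepting("aaaaaa", 2, True))  # 4: one computation per pair i, i + 2
```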

4 Restrictions on the Simulating Machines 

As already mentioned in the introduction, the Sakoda and Sipser question asks for the costs, in states, of
the simulations of 1NFAs and 2NFAs by 2DFAs. Separations have been obtained by considering restrictions
on the target machines. In particular, the simulations of n-state 1NFAs (and hence also 2NFAs) by
sweeping, oblivious, and few reversal automata require exponentially many states.¹

Note that all the above restrictions are related to the movement of the input head.

However, these results do not solve the general problem. In fact, it has also been proved that the
simulations of (unrestricted) 2DFAs by these restricted models require exponentially many states. See
Figure 4 for a summary of these and other separations. Their proofs use rather involved arguments.

Concerning few reversal 2DFAs, we already mentioned that an o(m) upper bound on reversals implies
an O(1) upper bound [15]. We can also compare the sizes of 2DFAs making a fixed number of reversals.
For example, we observed that the language I_n is accepted by a 2DFA with n + ... states that makes only
one reversal, while each 1DFA (i.e., each 2DFA making 0 reversals) needs 2^n states to accept it. Hence,
2DFAs making 0 reversals can be exponentially larger than 2DFAs making 1 reversal.
What about 2DFAs making k reversals versus 2DFAs making k + 1 reversals, for k > 0?

Figure 4: An arrow from a class A of machines to a class B denotes an exponential separation, i.e., the
state cost of the simulation of machines in the class A by machines in the class B can be exponential.
A dashed arrow indicates the existence of a polynomial simulation. The conversions corresponding to
arrows marked (a) and (b) can be easily obtained by squaring the number of states; (c) derives
from (b) and (d). The (trivial) dashed arrows from oblivious, sweeping, and few reversal automata to
2DFAs are not depicted.

In the case k = 1 this question has been solved by Balcerzak and Niwiński [1], by proving an exponential
separation. Recently Kapoutsis and Pighizzini extended this separation to each integer k, providing
an infinite reversal hierarchy of 2DFAs [15]. It would be interesting to investigate similar questions in
the nondeterministic case.



5 The Case of Unary Languages 

Unary languages are defined over a one-letter alphabet Σ. In the following we stipulate Σ = {a}.

The state costs of the optimal simulations between different variants of unary automata have been
obtained by Chrobak [4] and by Mereghetti and Pighizzini [27] and are summarized in Figure 5.

From the picture we can observe that the cost of the optimal simulations in the unary case can be
smaller than in the general case. For example, the cost of the simulation of n-state 1NFAs by 1DFAs reduces from 2^n
to e^{Θ(√(n ln n))}. Quite surprisingly, eliminating at the same time both nondeterminism and two-way motion
has the same cost as eliminating only one of them.

1 A stronger separation can be given by considering the degree of non-obliviousness, which counts the number of different
trajectories of the head on inputs of the same length. A 2DFA has a sublinear degree of non-obliviousness if and only if
the number of different trajectories on inputs of length n is o(n). In [13] it was proven that the simulation of 1NFAs by 2DFAs
with a sublinear degree of non-obliviousness requires exponentially many states.

Figure 5: Costs of the optimal simulations between different kinds of unary automata. An arc labeled
f(n) from a vertex x to a vertex y means that a unary n-state automaton in the class x can be simulated
by an f(n)-state automaton in the class y. The e^{Θ(√(n ln n))} costs for the simulations of 1NFAs and
2DFAs by 1DFAs, as well as the cost Θ(n^2) for the simulation of 1NFAs by 2DFAs, have been proved in [4].
The e^{Θ(√(n ln n))} cost for the simulation of 2NFAs by 1DFAs has been proved in [27]. The other e^{Θ(√(n ln n))}
costs are easy consequences. All the costs n are trivial. The arc labeled "?" represents the open question
of Sakoda and Sipser.

The question of 1NFAs versus 2DFAs has been solved in the unary case in [4] by showing that the tight
cost is polynomial, more precisely Θ(n^2). This also gives the best known lower bound for the general
case.

Although the unary case looks simpler than the general one, the question of 2NFAs versus 2DFAs is not
only still open even in this case, but it also seems to be difficult and, at the same time, very challenging.
We now discuss its status.

Normal Forms for Unary Nondeterministic Automata 

The "simplicity" of automata over a unary alphabet, with respect to automata over a general alphabet,
allows us to give normal forms for unary 1NFAs and 2NFAs. These forms, at the price of a small increase
in the number of states, strongly restrict the use of nondeterminism and head reversals.

For the one-way case we mention the Chrobak normal form [4]. In this form the transition graph
of the automaton consists of a deterministic path from the initial state to a state q, together with k ≥ 1
deterministic loops. From the state q there are k outgoing edges, each of them connecting q to exactly
one state of a different loop. Hence, a 1NFA in this form is allowed to make in its computation
at most one nondeterministic choice, when it is in the state q. A degenerate case of 1NFA in Chrobak
normal form is an automaton whose transition graph consists exactly of one deterministic loop, without
the initial path. Each n-state unary 1NFA can be converted into an equivalent one in Chrobak normal
form with no more than n^2 states in the initial path and n states in the loops. Hence the conversion does
not significantly increase the number of states.²
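Because nondeterminism is confined to the state q, membership of a^m in a language accepted by a 1NFA in Chrobak normal form can be decided arithmetically. The Python sketch below uses a hypothetical encoding (a list of finality flags for the initial path and one list of flags per loop, each loop entered at its state 0) and does not model the degenerate single-loop case; it compares the closed-form test with a step-by-step simulation of the set of reachable states.

```python
from dataclasses import dataclass


@dataclass
class ChrobakNFA:
    """Hypothetical encoding of a unary 1NFA in Chrobak normal form.

    path_final[s] says whether the sth state of the initial path is final;
    s = 0 is the initial state and s = len(path_final) - 1 is the state q.
    loops[j][s] says whether the sth state of the jth loop is final; the edge
    from q to the jth loop enters its state 0.
    """
    path_final: list
    loops: list

    def accepts(self, m):
        """Closed-form test for a^m: at most one nondeterministic choice."""
        t = len(self.path_final) - 1
        if m <= t:
            return self.path_final[m]       # still on the deterministic path
        return any(loop[(m - t - 1) % len(loop)] for loop in self.loops)

    def simulate(self, m):
        """Step-by-step simulation of the set of reachable states on a^m."""
        t = len(self.path_final) - 1
        current = {(-1, 0)}                 # (-1, s) is the sth state of the path
        for _ in range(m):
            nxt = set()
            for j, s in current:
                if j == -1 and s < t:
                    nxt.add((-1, s + 1))    # deterministic step along the path
                elif j == -1:               # in q: choose one of the k loops
                    nxt.update((k, 0) for k in range(len(self.loops)))
                else:                       # deterministic step around loop j
                    nxt.add((j, (s + 1) % len(self.loops[j])))
            current = nxt
        return any(self.path_final[s] if j == -1 else self.loops[j][s]
                   for j, s in current)


nfa = ChrobakNFA([False, False, True], [[True, False], [False, False, True]])
print([m for m in range(12) if nfa.accepts(m)])  # [2, 3, 5, 7, 8, 9, 11]
```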

A generalization of the Chrobak normal form to the two-way case has been obtained by Geffert,
Mereghetti, and Pighizzini [9]. In order to present it, it is useful to relax the notion of equivalence
between automata, by allowing a finite number of "errors". More precisely, two finite automata are
said to be almost equivalent if the symmetric difference of their accepted languages is finite, i.e., the
languages accepted by the two automata coincide except for a finite number of strings.

Theorem 5.1 ([9]) Each n-state unary 2NFA A can be transformed into an almost equivalent 2NFA M
such that

• M is quasi-sweeping, namely, head reversals and nondeterministic choices are possible only when
the head is scanning the endmarkers;³

• M has at most 2n + 2 states;

• the languages accepted by A and M can differ only on strings of length at most 5n^2.



An inspection of the proof of Theorem 5.1 shows that M and its computations have a very simple structure
(see also [11]). In particular, in each traversal of the input M uses a deterministic loop to count the input
length modulo some integer.

The 2NFA M can be easily turned into an automaton "fully" equivalent to the original 2NFA A, by
adding 5n^2 + ... states, used to fix the "errors" in a preliminary scan of the input.

We point out that a similar normal form for unary 2DFAs has been obtained in [23].



The normal form in Theorem 5.1 gives a strong simplification of unary 2NFAs, which has been an
important tool to prove several results on them. First of all, it has been used in [9] to prove a
subexponential, but still superpolynomial, upper bound for the conversion of unary 2NFAs into
equivalent 2DFAs:

Theorem 5.2 ([9]) Each unary n-state 2NFA can be simulated by a 2DFA with e^{O(ln^2 n)} states.

It is interesting to discuss the main idea in the proof of this result. Suppose the given n-state 2NFA A
is already in the normal form of Theorem 5.1. We can observe that if an accepting computation C visits
the left endmarker more than n times, then there exists a shorter accepting computation C' on the same
input. In fact, in C some state q must be visited twice with the head on the left endmarker and
so the computation C' can be obtained by cutting the part of C between the two repetitions. Hence, if we
assume acceptance on the left endmarker, to detect whether an input string is accepted it is enough to check the
existence of a computation starting in the initial state with the head on the left endmarker, ending in a
final state with the head on the same endmarker, and visiting the left endmarker at most n times.

To this aim we can introduce a predicate reachable(p, q, k) which holds true exactly when there is a
path starting in the state p on the left endmarker, ending in the state q on the same endmarker, and visiting
it at most k times. This predicate can be recursively computed using a divide-and-conquer technique.
The implementation of the resulting procedure leads to a 2DFA with e^{O(ln^2 n)} states.
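One natural way to organize the divide-and-conquer computation is the midpoint recursion familiar from Savitch's theorem; the Python sketch below follows that pattern, but it is only an illustration. The relation one_trip is a hypothetical abstraction given as a set of pairs: in the real construction it depends on the input and is evaluated by the 2DFA itself by traversing the input. The input is then accepted if and only if reachable(q0, f, n) holds for the initial state q0 and some final state f. Roughly speaking, the recursion depth is logarithmic in k, and storing a few states per level gives n^{O(log n)} = e^{O(ln^2 n)} possible stack contents.

```python
def make_reachable(one_trip, states):
    """Build a divide-and-conquer version of the predicate reachable(p, q, k).

    one_trip: hypothetical set of pairs (p, r) meaning that, starting in state
    p on the left endmarker, some computation segment comes back to the left
    endmarker in state r without visiting it in between.
    reachable(p, q, k) holds iff q is reachable from p with at most k visits
    to the left endmarker, counting the first and the last one.
    """
    def reachable(p, q, k):
        if k >= 1 and p == q:
            return True                 # a single visit suffices
        if k >= 2 and (p, q) in one_trip:
            return True                 # a single segment: two visits
        if k <= 2:
            return False
        h = (k + 2) // 2                # ceil((k + 1) / 2) visits in the first half
        # Try every middle state r; each half uses at most about k/2 visits,
        # so the recursion depth is logarithmic in k (no memoization on purpose).
        return any(reachable(p, r, h) and reachable(r, q, k - h + 1)
                   for r in states)

    return reachable


trips = {(0, 1), (1, 2), (2, 3)}        # a toy relation, not derived from a 2NFA
reachable = make_reachable(trips, {0, 1, 2, 3})
print(reachable(0, 3, 4), reachable(0, 3, 3))  # True False
```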



2 Besides [4], we refer the reader to [6, 7, 34]. All these papers present different algorithms and techniques for the conversion
of unary 1NFAs into Chrobak normal form.

3 In [36] the term sweeping was introduced for deterministic automata making head reversals only at the endmarkers. It is
natural to extend this notion to the nondeterministic case, to denote 2NFAs making head reversals only at the endmarkers. In
this case we have a further restriction: even nondeterministic decisions can be taken only when the input head is scanning the
endmarkers, not on "real" input symbols.
In case the given automaton is not in normal form, we first convert it into an almost equivalent
2NFA in normal form and then we apply the above procedure to the resulting automaton. Finally, with a
small modification which does not increase the state upper bound, we can fix the "errors", i.e., we can
manage strings of length ≤ 5n^2, in order to obtain a 2DFA fully equivalent to the original 2NFA.



The upper bound in Theorem 5.2 is subexponential, in the sense that it grows more slowly than the exponential
function e^n, but it is superpolynomial; in fact, it grows faster than any polynomial.

The natural question is whether or not this bound is tight. At the moment we do not have an
answer. However, the question is related to the relationship between deterministic and nondeterministic
logarithmic space. The discussion of this point is postponed to the next section.



The normal form in Theorem 5.1 has been used to prove other interesting properties of unary 2nfas. 
Among them: 

• Each unary n-state 2NFA accepting a language L can be transformed into a 2NFA with O(n^8) states
accepting the complement of L [10].

• Each unary n-state 2NFA can be transformed into an equivalent unambiguous 2NFA with a number
of states polynomial in n [11].

The proof of the first result uses an inductive counting technique. The second result was
obtained by adapting one of the constructions discussed in the next section (in particular, the construction used
to prove Lemma 6.1).



6 Relationships with the L versus NL Question 

Interesting connections have been obtained between the question of Sakoda and Sipser and the open question
of the relationship between the classes of languages accepted in logarithmic space by deterministic and
nondeterministic Turing machines (denoted by L and NL, respectively). In this section we
briefly discuss them.

(i) First of all, Berman and Lingas [3] proved that if L = NL then for each n-state 2NFA A with an
input alphabet of a symbols there exists a 2DFA B with a number of states polynomial in n and a
which agrees with A on strings of length at most n. Hence L = NL implies a polynomial simulation
of 2NFAs by 2DFAs on "short" inputs.

This result has recently been improved along the following lines.

(ii) Geffert and Pighizzini [11] considered the unary case. They proved that L = NL would imply a
polynomial simulation of unary 2NFAs by 2DFAs.⁴ Compared with condition (i), we can observe
that while only devices with a unary input alphabet are considered here, the restriction on the
length of the inputs is removed.

This result shows the relevance of the unary case. In fact, proving the optimality of the bound in 



Theorem 5.2 or even proving a smaller but still superpolynomial lower bound for the simulation 
of unary 2nfas by 2DFAs would imply the separation of L and NL. 

(iii) Kapoutsis |[T9l generalized the condition (i) by proving that L/poly 5 NL if and only if for each 
«-state 2nfa A with an input alphabet of a symbols there exists a 2NFA B with a number of states 
polynomial in n which agrees with A on strings of length at most n, where L/ poly denotes the class 



4 The restriction to the unary case concerns only two-way automata, not the classes L and NL. 
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of languages accepted by deterministic logspace bounded machines that can access a polynomial 
advice \22 Hence L/poly 5 NL is equivalent to the existence of a state polynomial simulation of 
2nfas by 2d FAs on "short" inputs. Since L/ poly D L and L C NL, the only-if condition is stronger 
than the condition (i). Furthermore, in this case the converse also holds. 

(iv) Quite recently, Kapoutsis and Pighizzini [16] proved the equivalence between L/poly ⊇ NL and several other propositions. In particular, they showed that L/poly ⊇ NL is equivalent to the existence of a state polynomial simulation of unary 2NFAs by 2DFAs. As for (iii), the only-if condition is stronger than the condition in (ii) and, furthermore, in this case the converse also holds.

We now discuss (ii) and (iv) in more detail.



The Graph Accessibility Problem 

The Graph Accessibility Problem (GAP) plays a central role in the above mentioned investigations of the relationships between the L versus NL and L/poly versus NL questions and the problem of Sakoda and Sipser in the unary case. GAP is the problem of deciding, given a directed graph G = (V, E) and two fixed vertices s, t ∈ V, whether or not there exists a path from s to t.⁶

It is well known that GAP is an NL-complete problem [33]. Hence, GAP ∈ NL and, moreover, GAP ∈ L if and only if L = NL. In other words, GAP is a hardest problem in NL. As we discuss below, the restriction of GAP to a fixed set of vertices represents, in some sense (and under a suitable encoding), a hardest language for unary 2NFAs.
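As a concrete reference point for the definition of GAP, here is a minimal Python sketch that decides it by breadth-first search. It assumes that vertices are numbered 0, ..., n−1 and that edges are given as a set of pairs; this representation is our choice. Note that breadth-first search uses linear, not logarithmic, space: the sketch only illustrates the problem, not a logspace algorithm for it.

```python
from collections import deque


def gap(n: int, edges: set[tuple[int, int]], s: int, t: int) -> bool:
    """Is there a directed path from s to t in G = ({0, ..., n-1}, edges)?"""
    successors: dict[int, list[int]] = {v: [] for v in range(n)}
    for i, j in edges:
        successors[i].append(j)
    visited = {s}
    queue = deque([s])
    while queue:  # breadth-first search from s
        u = queue.popleft()
        if u == t:
            return True
        for v in successors[u]:
            if v not in visited:
                visited.add(v)
                queue.append(v)
    return False


E = {(0, 1), (1, 2), (3, 2)}
print(gap(4, E, 0, 2), gap(4, E, 2, 0))  # True False
```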

First of all, in [11] it was shown how to reduce the language accepted by a unary n-state 2NFA A to the accessibility problem on graphs with N = O(n) vertices. In other words, given an integer m it is possible to obtain a graph G(m) with N vertices such that the unary string $a^m$ is accepted by A if and only if G(m) ∈ GAP. Furthermore, the reduction can be computed by a finite state transducer of size polynomial in N.

If L = NL then there is a logspace bounded deterministic machine that solves GAP. By restricting this machine to inputs encoding graphs with N vertices, we obtain a finite state automaton $D_{\mathrm{GAP}}$ which can decide whether or not the graph G(m) resulting from the above reduction is in GAP. By a suitable composition of the transducer with $D_{\mathrm{GAP}}$ we get a 2DFA B equivalent to the original 2NFA A, with a number of states polynomial in n, the number of states of A (see Figure 6). We refer the reader to [11] for details. In particular, we point out that the reduction uses the normal form for unary 2NFAs presented in Theorem 5.1. This construction has been extended to outer nondeterministic automata in [8].



Furthermore, with a similar technique, it is possible to show that unary 2NFAs and 2ONFAs over any input alphabet can be simulated by equivalent unambiguous 2NFAs with polynomially many states [8, 11].⁷

It is quite natural to ask whether the converse also holds, i.e., whether a state polynomial simulation of unary 2NFAs by 2DFAs would imply L = NL. The main obstacle to proving such a result is uniformity. Indeed, in [11] an even stronger result is proved, but under the additional hypothesis that the conversion from unary 2NFAs to 2DFAs is computed by a logspace bounded transducer.

On the other hand, it is not difficult to see that the construction described above works even under the weaker hypothesis L/poly ⊇ NL, i.e.:



5 A polynomial advice is a sequence of strings $(\alpha_n)_{n \ge 0}$ such that the length of $\alpha_n$ is bounded by a polynomial in n. Together with an input string x, the machine receives the advice corresponding to the length of x, namely the string $\alpha_{|x|}$.

6 As customary, we use GAP also to denote the set of positive instances of the graph accessibility problem. Hence, we write G ∈ GAP if and only if the given directed graph G contains a path connecting two (implicitly) fixed vertices s and t.

7 These simulations do not require the assumption L = NL.

Figure 6: Simulating a unary 2NFA with a 2DFA of polynomial size, under the hypothesis L = NL



Lemma 6.1 If L/poly ⊇ NL then the state cost of the simulation of unary 2NFAs by 2DFAs is polynomial.

In [16], the converse of Lemma 6.1 has also been proved. The main idea is to exhibit, under the hypothesis that the state cost of the simulation of unary 2NFAs by 2DFAs is polynomial, a logspace bounded deterministic machine M which, making use of a polynomial advice, solves the graph accessibility problem. This is done in the following steps:

• A function $\langle\cdot\rangle$ mapping instances of GAP to unary strings is provided. For each integer n, its restriction $\langle\cdot\rangle_n$ is a reduction from GAP restricted to graphs with n vertices to a unary language $\mathrm{UGAP}_n$.

• A unary 2NFA $A_n$ recognizing $\mathrm{UGAP}_n$ with a number of states polynomial in n is described.

• The automaton $A_n$ is replaced by an equivalent 2DFA $B_n$.

• An instance of GAP can be solved by combining the machine computing the reduction with the 2DFA $B_n$, where n is the number of vertices in the instance under consideration (hence n depends only on the input length); see Figure 7. In particular, the resulting machine M receives the input string, which represents a graph G, together with an encoding of the appropriate 2DFA $B_n$, where n is the number of vertices of G. If the state cost of the simulation of unary 2NFAs by 2DFAs is polynomial, then $B_n$ can be encoded by a string of length polynomial in n. This encoding is the polynomial advice for M. Furthermore, using a suitable encoding for $\mathrm{UGAP}_n$ (we sketch some ideas below), the workspace used by M can be bounded by a logarithmic function in n.
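To make the role of the advice concrete, here is a minimal Python sketch of how a machine like M could run a 2DFA whose transition table is supplied as data on the unary input $a^m$. The table format, the endmarker symbols and the function name `run_2dfa` are hypothetical choices for illustration; this is not the encoding used in [16]. The input $a^m$ is never written out. The simulator keeps only the current state and the head position, which is a counter of about $\log_2(m+2)$ bits, and this counter becomes a problem when m is exponential in n.

```python
LEFT, RIGHT = "<", ">"  # hypothetical endmarker symbols


def run_2dfa(delta: dict, start: str, accept: str, num_states: int, m: int) -> bool:
    """Simulate a 2DFA on the unary input <a^m> without storing the input.

    delta maps (state, symbol) to (next_state, move) with move in {-1, +1};
    the 2DFA is assumed never to move off the endmarkers.
    """
    state, pos = start, 0  # the head starts on the left endmarker
    # There are num_states * (m + 2) configurations; running longer means a loop.
    for _ in range(num_states * (m + 2) + 1):
        if state == accept:
            return True
        symbol = LEFT if pos == 0 else RIGHT if pos == m + 1 else "a"
        if (state, symbol) not in delta:
            return False  # no transition defined: halt and reject
        state, move = delta[(state, symbol)]
        pos += move
    return False  # some configuration repeated, so the 2DFA never halts


# Hypothetical example: a one-way sweep accepting a^m iff m is even.
PARITY = {
    ("q0", LEFT): ("q0", +1),
    ("q0", "a"): ("q1", +1),
    ("q1", "a"): ("q0", +1),
    ("q0", RIGHT): ("acc", -1),
}
print([m for m in range(6) if run_2dfa(PARITY, "q0", "acc", 3, m)])  # [0, 2, 4]
```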




Figure 7: The machine M solving GAP using $B_n$ as advice



We now describe the encoding $\langle\cdot\rangle_n$ and the languages $\mathrm{UGAP}_n$.

For each integer n, let $V_n = \{0, 1, \ldots, n-1\}$ and let $K_n$ be the complete graph with vertex set $V_n$. With each edge (i, j) of $K_n$ we associate a different prime $p_{(i,j)}$. To this aim we choose the first $n^2$ prime numbers.

A graph $G = (V_n, E)$ with n vertices is encoded as the product of the primes corresponding to the edges in E (see Figure 8), i.e., by the number

$$\langle G\rangle_n = \prod_{(i,j)\in E} p_{(i,j)}.$$

Conversely, with each integer m we associate the graph $K_n(m) = (V_n, E(m))$ such that (i, j) ∈ E(m) if and only if $p_{(i,j)}$ divides m. It should be clear that $K_n(\langle G\rangle_n) = G$.

Figure 8: The complete graph $K_4$ with a subgraph G. The number $p_{(i,j)}$ associated with the edge (i, j) is the $(i \cdot n + j + 1)$th prime number. In $K_4$ the edges (i, i) are not depicted.
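The encoding and its inverse can be sketched in Python as follows. The sketch assumes, as in Figure 8, that $p_{(i,j)}$ is the $(i\cdot n+j+1)$th prime. The helper names `first_primes`, `encode` and `decode` are hypothetical.

```python
from math import prod


def first_primes(k: int) -> list[int]:
    """Return the first k primes by trial division (adequate for small k)."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < k:
        # candidate is prime iff no prime p with p*p <= candidate divides it
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def encode(n: int, edges: set[tuple[int, int]]) -> int:
    """<G>_n: the product of the primes p_(i,j) over the edges (i, j) of G."""
    primes = first_primes(n * n)
    return prod(primes[i * n + j] for (i, j) in edges)  # p_(i,j) = (i*n+j+1)-th prime


def decode(n: int, m: int) -> set[tuple[int, int]]:
    """Edge set E(m) of K_n(m): (i, j) is an edge iff p_(i,j) divides m."""
    primes = first_primes(n * n)
    return {(i, j) for i in range(n) for j in range(n) if m % primes[i * n + j] == 0}


G = {(0, 1), (1, 3), (2, 1)}
assert decode(4, encode(4, G)) == G  # K_n(<G>_n) = G
```

Since the primes are pairwise distinct, $p_{(i,j)}$ divides $\langle G\rangle_n$ exactly when (i, j) is an edge of G, which is why decoding recovers G.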

We can now define the unary encoding of GAP restricted to graphs with n vertices as the following language:

$$\mathrm{UGAP}_n = \{\, a^m \mid K_n(m) \text{ has a path from } 0 \text{ to } n-1 \,\}$$
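Membership in $\mathrm{UGAP}_n$ can be tested directly from the definition. The following Python sketch uses the same prime convention as Figure 8 and assumes m ≥ 1. It searches $K_n(m)$ from vertex 0 and generates each edge only when it is needed, by a divisibility test. The function names are hypothetical.

```python
def first_primes(k: int) -> list[int]:
    """Return the first k primes by trial division (adequate for small k)."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < k:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def in_ugap(n: int, m: int) -> bool:
    """True iff a^m is in UGAP_n, i.e. K_n(m) has a path from 0 to n-1."""
    primes = first_primes(n * n)
    reached, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(n):
            # (i, j) is an edge of K_n(m) iff p_(i,j), the (i*n+j+1)-th prime, divides m
            if j not in reached and m % primes[i * n + j] == 0:
                reached.add(j)
                stack.append(j)
    return n - 1 in reached


print([m for m in range(1, 20) if in_ugap(2, m)])  # [3, 6, 9, 12, 15, 18]
```

For n = 2 the only way to reach vertex 1 is the edge (0, 1), whose prime is 3, so $\mathrm{UGAP}_2$ consists of the strings whose length is a multiple of 3.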

We now describe a 2NFA $A_n$ recognizing $\mathrm{UGAP}_n$. Roughly speaking, $A_n$ implements the standard nondeterministic algorithm solving GAP. From a vertex i (starting with i ← 0 at the beginning of the computation), $A_n$ guesses another vertex j and then verifies whether (i, j) ∈ E(m). If this is the case, then $A_n$ continues the same simulation after the assignment i ← j, until it reaches i = n − 1. However, if at some step a pair (i, j) ∉ E(m) is reached, then $A_n$ hangs and rejects. To check the condition (i, j) ∈ E(m), $A_n$ computes the length of its input modulo $p_{(i,j)}$.
In more detail:

• $A_n$ is outer nondeterministic and sweeping, i.e., it can reverse the input head direction and make nondeterministic choices only when the head is scanning one of the endmarkers. Furthermore, in each traversal $A_n$ counts the input length modulo a prime number.

• On the endmarkers each state is interpreted either as a copy of a vertex in $V_n$ or as a hang state.

• The automaton can traverse an input $a^m$ from one endmarker in a copy of vertex i to the opposite endmarker in some copy of vertex j, without visiting the endmarkers in between, if and only if the number $p_{(i,j)}$ divides m. In particular, when the automaton is visiting one endmarker in a state representing the vertex i, it guesses another vertex j by entering an appropriate loop in which it traverses the input and counts its length modulo $p_{(i,j)}$. The state in this loop which corresponds to the remainder 0 is interpreted as the vertex j of the graph; the other states are interpreted as hang states. Hence, when the input head reaches the opposite endmarker, the automaton continues the simulation or hangs and rejects, depending on the reached state.

• The computation starts on the left endmarker in a state representing the vertex 0.

• When a state representing the vertex n − 1 is reached with the head on one of the endmarkers, the automaton $A_n$ moves to an accepting state and stops.
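The behaviour described above can be checked on small inputs with a deterministic Python sketch. The sketch tracks the set of vertices that $A_n$ can reach on the endmarkers, so every nondeterministic guess is explored. Each traversal is simulated symbol by symbol in a counting loop of length $p_{(i,j)}$ and ends in the copy of vertex j only when the remainder is 0. We assume, as in Figure 8, that $p_{(i,j)}$ is the $(i\cdot n+j+1)$th prime. The head direction is ignored because it does not affect the remainder. This simulates the described behaviour; it is not the transition table of $A_n$.

```python
def first_primes(k: int) -> list[int]:
    """Return the first k primes by trial division (adequate for small k)."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < k:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def traverse(m: int, p: int) -> int:
    """One sweep over a^m inside a counting loop of length p, starting at remainder 0."""
    r = 0
    for _ in range(m):  # one step per input symbol
        r = (r + 1) % p
    return r  # loop state reached at the opposite endmarker


def accepts(n: int, m: int) -> bool:
    """Does some computation of A_n on a^m reach vertex n-1 on an endmarker?"""
    primes = first_primes(n * n)
    reached, frontier = {0}, [0]  # start on the left endmarker in vertex 0
    while frontier:
        i = frontier.pop()
        for j in range(n):  # nondeterministic guess of the next vertex j
            # remainder 0 means a copy of vertex j; any other remainder is a hang state
            if j not in reached and traverse(m, primes[i * n + j]) == 0:
                reached.add(j)
                frontier.append(j)
    return n - 1 in reached


print(accepts(3, 39), accepts(3, 3))  # True False
```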

Using properties of the distribution of prime numbers, it can be proved that the number of states of $A_n$ is polynomial in n.
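A rough count illustrates why the state count is polynomial. Suppose, as a simplification of the description above, that $A_n$ has one counting loop of $p_{(i,j)}$ states for every ordered pair (i, j) and a constant number of other states, which we ignore. The loops then contribute the sum of the first $n^2$ primes. The sum of the first k primes is asymptotic to $(k^2 \ln k)/2$, so with $k = n^2$ this is about $n^4 \ln n$. The function below is hypothetical and is not the precise bound from [16].

```python
def first_primes(k: int) -> list[int]:
    """Return the first k primes by trial division (adequate for small k)."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < k:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def loop_states(n: int) -> int:
    """Hypothetical count: one loop of p_(i,j) states for each ordered pair (i, j)."""
    return sum(first_primes(n * n))


print([loop_states(n) for n in range(1, 5)])  # [2, 17, 100, 381]
```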

Finally, we have to show that the machine M works in logarithmic space. This is not true if we implement M directly as in Figure 7. In fact, the length of the unary encoding of a graph with n vertices can be exponential in n. For instance, the unary string encoding the complete graph on n vertices has length $\langle K_n\rangle_n$, the product of the first $n^2$ prime numbers, which is at least $2^{n^2}$.
This problem is solved as follows:

• The unary encoding is replaced by a "prime encoding" that, in this case, is a list of all primes associated with the edges in the input graph. Hence, the output of the reduction is this list.⁸

• Due to a structural property of 2DFAs (see [23]), it is possible to modify the automaton $B_n$, still keeping its number of states polynomial, by replacing its unary input tape with a tape containing a prime encoding of the unary input. Hence, after these modifications, the machine M still solves GAP.

• Storing the prime encoding would require polynomial space, which is still too much for our purposes. To avoid this problem, the prime encoding is not kept in the internal memory of M. Instead, it is computed and recomputed "on the fly" each time $B_n$ needs to access it. This is done by restarting the machine that computes the prime encoding from the input graph G.
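The recomputation idea can be illustrated by a Python sketch. The sketch follows the Figure 8 convention that $p_{(i,j)}$ is the $(i\cdot n+j+1)$th prime. The function `component`, a hypothetical name, regenerates the k-th entry of the prime encoding from the graph every time it is asked, so the whole list is never stored. The function `residue` uses it to compute $\langle G\rangle_n \bmod q$, which is the kind of modular information a counting automaton extracts from $a^{\langle G\rangle_n}$. The sketch shows only this access pattern; it is not the transformation of $B_n$ from [23].

```python
def is_prime(x: int) -> bool:
    """Trial division primality test."""
    if x < 2:
        return False
    d = 2
    while d * d <= x:
        if x % d == 0:
            return False
        d += 1
    return True


def nth_prime(r: int) -> int:
    """The r-th prime (1-based), found with just a counter and a candidate."""
    count, candidate = 0, 1
    while count < r:
        candidate += 1
        if is_prime(candidate):
            count += 1
    return candidate


def component(n: int, edges: set[tuple[int, int]], k: int) -> int:
    """k-th entry (0-based) of the prime encoding, recomputed from G on every call."""
    seen = 0
    for i in range(n):
        for j in range(n):
            if (i, j) in edges:
                if seen == k:
                    return nth_prime(i * n + j + 1)  # p_(i,j)
                seen += 1
    raise IndexError(k)


def prime_encoding(n: int, edges: set[tuple[int, int]]) -> str:
    """The edge primes, written in decimal and separated by '#'."""
    return "#".join(str(component(n, edges, k)) for k in range(len(edges)))


def residue(n: int, edges: set[tuple[int, int]], q: int) -> int:
    """<G>_n mod q, reading one component at a time without storing the list."""
    r = 1 % q
    for k in range(len(edges)):
        r = r * component(n, edges, k) % q
    return r


E = {(0, 1), (1, 1)}
print(prime_encoding(2, E), residue(2, E, 3), residue(2, E, 5))  # 3#7 0 1
```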

Along these lines the converse of Lemma 6.1 is proved. This yields the following:

Theorem 6.1 L/poly ⊇ NL if and only if the state cost of the simulation of unary 2NFAs by 2DFAs is polynomial.

We refer the reader to [16] for the details and for the equivalence of L/poly ⊇ NL with several other statements.



7 Concluding Remarks 

We strongly believe that the Sakoda and Sipser question is a very challenging problem which deserves further investigation. Several interesting models have been considered and many deep results have been obtained in the research related to this question. As pointed out, connections with space complexity have been discovered. These connections are not limited to the relationships with the question of the power of nondeterminism in logspace bounded computations. In fact, in more than one case, techniques from space complexity turned out to be useful for studying two-way automata. For instance, the divide-and-conquer technique used to prove Theorem 5.2 derives from the proof of the famous Savitch Theorem [33]. The inductive counting technique used in [10] to obtain the polynomial complementation of 2NFAs derives from the argument used to prove the closure under complementation of nondeterministic space, the famous result proved independently in 1988 by Immerman [14] and Szelepcsényi [37].
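For readers unfamiliar with Savitch's argument, here is a Python sketch of its core recursion applied to GAP. To decide whether there is a path of length at most $2^k$ from u to v, the recursion tries every midpoint w and checks both halves with bound $2^{k-1}$. The recursion depth is about $\log_2 n$ and each frame needs O(log n) bits, which yields $O(\log^2 n)$ space on a Turing machine. The adjacency-list representation is our assumption. No memoization is used on purpose, since caching results would defeat the space bound. The sketch illustrates Savitch's technique only, not the construction used for Theorem 5.2.

```python
def reach(adj: list[set[int]], u: int, v: int, k: int) -> bool:
    """True iff there is a path from u to v of length at most 2**k."""
    if k == 0:
        return u == v or v in adj[u]  # length 0 or a single edge
    # Split the path at a midpoint w: both halves have length at most 2**(k-1).
    return any(reach(adj, u, w, k - 1) and reach(adj, w, v, k - 1)
               for w in range(len(adj)))


def savitch_gap(adj: list[set[int]], s: int, t: int) -> bool:
    """Decide GAP: if t is reachable from s, a shortest path has length <= n-1 <= 2**k."""
    k = (len(adj) - 1).bit_length()
    return reach(adj, s, t, k)


chain = [{1}, {2}, {3}, set()]  # 0 -> 1 -> 2 -> 3
print(savitch_gap(chain, 0, 3), savitch_gap(chain, 3, 0))  # True False
```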

Actually, a complexity theory for finite automata can be developed along the lines of standard complexity theory for Turing machines, with classes, reductions, complete problems, and so on. This approach was suggested in the original paper by Sakoda and Sipser [32]. We recommend to the interested reader the recent paper by Kapoutsis [20], where the name minicomplexity is proposed for this theory. The same author is collecting and organizing all the material and results in this area on a website, see www.minicomplexity.org.



8 More generally, a prime encoding of a unary string $a^m$ is a sequence of the form $z_1\#z_2\#\cdots\#z_k$, where $z_1, z_2, \ldots, z_k$ are strings encoding, in an arbitrary order, the prime powers in the factorization of m.